[SPARK-25908][CORE][SQL] Remove old deprecated items in Spark 3 by srowen · Pull Request #22921 · apache/spark

srowen · 2018-11-01T14:19:15Z

What changes were proposed in this pull request?

Remove some AccumulableInfo .apply() methods
Remove non-label-specific multiclass precision/recall/fScore in favor of accuracy
Remove toDegrees/toRadians in favor of degrees/radians (SparkR: only deprecated)
Remove approxCountDistinct in favor of approx_count_distinct (SparkR: only deprecated)
Remove unused Python StorageLevel constants
Remove Dataset unionAll in favor of union
Remove unused multiclass option in libsvm parsing
Remove references to deprecated spark configs like spark.yarn.am.port
Remove TaskContext.isRunningLocally
Remove ShuffleMetrics.shuffle* methods
Remove BaseReadWrite.context in favor of session
Remove Column.!== in favor of =!=
Remove Dataset.explode
Remove Dataset.registerTempTable
Remove SQLContext.getOrCreate, setActive, clearActive, constructors
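
For code that still calls some of the removed names, the list above can be turned into a small lookup. The following is a minimal sketch, not part of this PR or of Spark: `REPLACEMENTS` and `find_removed_apis` are hypothetical names, and the mapping only covers entries where the description names a direct replacement.

```python
import re

# Removed name -> replacement, taken from the PR description above.
# The scanner itself is a hypothetical helper, not part of Spark.
REPLACEMENTS = {
    "toDegrees": "degrees",
    "toRadians": "radians",
    "approxCountDistinct": "approx_count_distinct",
    "unionAll": "union",
    "!==": "=!=",
}


def find_removed_apis(source):
    """Return (line_number, removed_name, replacement) for each hit."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in REPLACEMENTS.items():
            # Word boundaries only make sense for identifier-like names.
            if old.isidentifier():
                pattern = r"\b" + re.escape(old) + r"\b"
            else:
                pattern = re.escape(old)
            if re.search(pattern, line):
                hits.append((lineno, old, new))
    return hits


# Illustrative input: three lines of Scala-like user code.
sample = """val a = df1.unionAll(df2)
val b = df.filter(col("x") !== 1)
val c = df.select(degrees(col("angle")))"""

for hit in find_removed_apis(sample):
    print(hit)
```

The third sample line already uses `degrees`, so it is not reported. A plain text scan like this can produce false positives (for example, an internal method that shares a name with a removed public one), so each hit still needs a manual check.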

Not touched yet

everything else in MLLib
HiveContext
Anything deprecated more recently than 2.0.0, generally

How was this patch tested?

Existing tests

srowen · 2018-11-01T14:20:39Z

...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala

This was just a bug fix I spotted

srowen · 2018-11-01T14:21:09Z

R/pkg/R/generics.R

@felixcheung might want to check if I'm handling these R changes correctly

my concern is that these are breaking changes in a version without having them deprecated first...
could we leave the old one to redirect and add .Deprecate?

I think my comment didn't get connected to this one -- @felixcheung what do you think about the argument that this almost surely was meant to be deprecated along with counterparts in Scala/Python? leaving them in would make this inconsistent. As the degrees, radians, and approxCountDistinct are reasonably niche and have a direct replacement that's compatible with older versions, I feel like this is OK for 3.0?

I think it's super lightweight to have an approxCountDistinct that calls approx_count_distinct with a deprecation warning?
My thought was that the R API was not always in sync with, or as complete as, Python's, and a breaking API change (i.e. the job will fail) seems a bit drastic even in a major release.

It is, but then again that's exactly what was deprecated and removed in Python and Scala. Major versions can have breaking changes. Yes R isn't always in sync but that's a bug not a feature? Let me surface this to dev@ as I think it's going to come up a few more times.

SparkQA · 2018-11-01T14:26:39Z

Test build #98353 has finished for PR 22921 at commit 259e7d1.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2018-11-01T17:43:29Z

core/src/main/scala/org/apache/spark/SparkConf.scala

Do we need to remove these? They are warnings for users if they set the wrong config, right?

Yeah I can add them back. Wasn't sure whether they are still valuable or just old.

rxin · 2018-11-01T17:51:38Z

sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala

keep these two lines?

Makes sense, will add it back. Yeah will leave this open a short while to make sure there is time to comment.

rxin · 2018-11-01T17:54:09Z

seems good to me; might want to leave this open for a few days so more people can take a look

SparkQA · 2018-11-01T19:11:36Z

Test build #98354 has finished for PR 22921 at commit 7a0ecd2.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-11-01T22:26:45Z

Test build #98363 has finished for PR 22921 at commit d50b5b5.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-11-02T02:00:25Z

Test build #98373 has finished for PR 22921 at commit bd4f5ab.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-11-02T07:00:48Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

Can we rewrite this test case using select(explode()), like what we did in the following test cases?

Yeah, I'll try to bring back the test case.

gatorsmile · 2018-11-02T07:03:57Z

LGTM except a minor comment about the test case. Also we need to fix the PySpark test failure

holdenk

Thanks for doing this pre Spark 3 cleanup work :)

holdenk · 2018-11-02T16:07:37Z

R/pkg/R/functions.R

I'm confused about the since annotation here: where was the degrees implementation in 2.1.0? When I look at https://spark.apache.org/docs/latest/api/R/index.html I don't see the degrees function, just toDegrees?

degrees was added in Scala/Python in 2.1.0, which is what I was thinking of, but yeah really this must be since 3.0.0 right? I'll fix it.

yes.. (version here is R API specific)

holdenk · 2018-11-02T16:08:01Z

R/pkg/R/functions.R

Similar comment with degrees

holdenk · 2018-11-02T16:10:40Z

python/pyspark/sql/functions.py

Looks like the removal of this is causing the test failure, maybe do a grep for approxCountDistinct in the tests?

Yeah, in some cases the deprecated user-facing method was named the same way as some internal method and I changed the wrong one. I'll investigate.

holdenk · 2018-11-02T16:11:34Z

python/pyspark/storagelevel.py

cc @MLnick I know this was a thing on your radar in some way for dataframe caching maybe? Do we actually want to remove this for 3+?

SparkQA · 2018-11-03T02:24:25Z

Test build #98414 has finished for PR 22921 at commit 57ef4e8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-11-03T04:37:42Z

Looks okay to me too, but I'd also leave this open for a few more days.

felixcheung · 2018-11-03T21:12:29Z

R/pkg/R/generics.R

my concern is that these are breaking changes in a version without having them deprecated first...
could we leave the old one to redirect and add .Deprecate?

felixcheung · 2018-11-03T21:13:25Z

R/pkg/R/functions.R

degrees and radians will need to be added to NAMESPACE file for export

felixcheung · 2018-11-03T21:13:59Z

R/pkg/R/functions.R

it's actually new in R for 3.0.0 then

Right, will fix that one too if I missed it, per #22921 (comment)

srowen · 2018-11-03T21:28:44Z

Yeah, it's a good point that these weren't deprecated, but I assume they should have been. Same change, same time, same logic. Given that it's a reasonably niche method, I thought it would be best to go ahead and be consistent here?

SparkQA · 2018-11-04T01:48:25Z

Test build #98433 has finished for PR 22921 at commit df92f0f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-11-05T19:28:40Z

Test build #98480 has finished for PR 22921 at commit a6891f7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen

@felixcheung ready for another look. I retained the existing methods but deprecated them, and directed them to the newer methods in the JVM, because the old ones are gone. Not sure I got it 100% right.
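
The approach described here (keep the old name, warn, and delegate to the new one) is done in the PR's R code, which is not reproduced in this thread. As a language-neutral illustration only, here is a minimal Python sketch of the same shim pattern. It uses the standard `warnings` module rather than R's `.Deprecated`, and the function bodies are stand-ins built on `math.degrees`, not Spark code.

```python
import math
import warnings


def degrees(x):
    """Newer name: convert radians to degrees."""
    return math.degrees(x)


def toDegrees(x):
    """Old name kept as a thin shim that warns and delegates."""
    warnings.warn(
        "toDegrees is deprecated; use degrees instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not the shim
    )
    return degrees(x)


# Calling the old name still works, but a warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = toDegrees(math.pi)

print(result, [str(w.message) for w in caught])
```

Existing jobs keep running for one more release cycle, which is the trade-off felixcheung argued for above: callers get a warning instead of a failure.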

SparkQA · 2018-11-07T03:24:07Z

Test build #98536 has finished for PR 22921 at commit 6bcbf79.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-11-07T03:40:17Z

Test build #98537 has finished for PR 22921 at commit af748d5.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2018-11-07T07:36:31Z

R/pkg/R/functions.R

+#' head(select(df, approx_count_distinct(df$gear, 0.02)))
+#' head(select(df, countDistinct(df$gear, df$cyl)))
+#' head(select(df, n_distinct(df$gear)))
+#' head(distinct(select(df, "gear")))}


we only need one set - they both are @rdname column_aggregate_functions so will duplicate all other examples

Thanks, @HyukjinKwon fixed this. Pending tests, does the change look OK to you on the R side @felixcheung ?

Fix CRAN check failure at 22921

felixcheung

R LGTM except 1 comment

felixcheung · 2018-11-07T17:14:04Z

R/pkg/R/functions.R

 #' @aliases toRadians toRadians,Column-method
 #' @note toRadians since 1.4.0
 setMethod("toRadians",
+signature(x = "Column"),


fix indentation?

SparkQA · 2018-11-07T18:27:49Z

Test build #98551 has finished for PR 22921 at commit 3070975.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-11-07T20:53:05Z

Test build #98561 has finished for PR 22921 at commit 9f1ced3.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-11-08T01:43:53Z

Test build #4419 has finished for PR 22921 at commit 9f1ced3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2018-11-08T04:49:10Z

Merged to master

## What changes were proposed in this pull request?

- Remove some AccumulableInfo .apply() methods
- Remove non-label-specific multiclass precision/recall/fScore in favor of accuracy
- Remove toDegrees/toRadians in favor of degrees/radians (SparkR: only deprecated)
- Remove approxCountDistinct in favor of approx_count_distinct (SparkR: only deprecated)
- Remove unused Python StorageLevel constants
- Remove Dataset unionAll in favor of union
- Remove unused multiclass option in libsvm parsing
- Remove references to deprecated spark configs like spark.yarn.am.port
- Remove TaskContext.isRunningLocally
- Remove ShuffleMetrics.shuffle* methods
- Remove BaseReadWrite.context in favor of session
- Remove Column.!== in favor of =!=
- Remove Dataset.explode
- Remove Dataset.registerTempTable
- Remove SQLContext.getOrCreate, setActive, clearActive, constructors

Not touched yet

- everything else in MLLib
- HiveContext
- Anything deprecated more recently than 2.0.0, generally

## How was this patch tested?

Existing tests

Closes apache#22921 from srowen/SPARK-25908.

Lead-authored-by: Sean Owen <sean.owen@databricks.com>
Co-authored-by: hyukjinkwon <gurwls223@apache.org>
Co-authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>

zhengruifeng · 2020-02-26T06:15:32Z

mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala

  override def session(sparkSession: SparkSession): this.type = super.session(sparkSession)
-
-  // override for Java compatibility
-  override def context(sqlContext: SQLContext): this.type = super.session(sqlContext.sparkSession)


@srowen This public method seems not to have been deprecated before removal, and is available in 2.4.5.

scala> import org.apache.spark.ml.util.GeneralMLWriter
import org.apache.spark.ml.util.GeneralMLWriter

scala> new GeneralMLWriter(null).context(spark.sqlContext)
res0: org.apache.spark.ml.util.GeneralMLWriter = org.apache.spark.ml.util.GeneralMLWriter@26b150cd

There is no deprecation warning above. Does it matter?

This seems properly deprecated in MLWriter as its parent.
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.util.GeneralMLWriter@context(sqlContext:org.apache.spark.sql.SQLContext):GeneralMLWriter.this.type
and the Scaladoc explicitly shows this at GeneralMLWriter too. Seems right to remove together if we should in MLWriter.

Yeah it was deprecated in 2.0.0 and marked for removal in 3.0.0.

/**
 * Sets the Spark SQLContext to use for saving/loading.
 *
 * @deprecated Use session instead. This method will be removed in 3.0.0.
 */
@Since("1.6.0")
@deprecated("Use session instead. This method will be removed in 3.0.0.", "2.0.0")

I think ideally the Java overload and subclass overrides would be marked deprecated too, but they implicitly are. If there were a case that this is actually used, we could revive it, but just wondering how often people would be using save + SQLContext?

I never use this method; I was just checking. Thanks!

Remove many older deprecated items in Spark 3

a6891f7

srowen force-pushed the SPARK-25908 branch from df92f0f to a6891f7 Compare November 5, 2018 15:07

srowen added 2 commits November 6, 2018 17:07

Add back SparkR methods as deprecated

6bcbf79

Fix spacing

af748d5

HyukjinKwon and others added 2 commits November 7, 2018 17:11

Fix CRAN check failure at 22921

78f7ca8

Merge pull request #3 from HyukjinKwon/fix-pr-22921

3070975

Fix indentation

9f1ced3

asfgit closed this in 0025a83 Nov 8, 2018

cloud-fan mentioned this pull request Nov 8, 2018

[SPARK-26030][BUILD] Bump previousSparkVersion in MimaBuild.scala to be 2.4.0 #22977

Closed

srowen deleted the SPARK-25908 branch November 14, 2018 20:54

srowen mentioned this pull request Sep 5, 2019

[SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3 #25684

Closed
